Cluster problems
Monitor the status of your runtime cluster and runtime cloud nodes using the Cluster Status panel on the Runtime Management page (Manage > Runtime Management).
- Click the Cluster Issues tab on the Cluster Status panel, to view cluster problem reports. If that tab is not present, there are not currently any reported cluster problems.
The following table shows information about all of the different types of reportable cluster problems.
note
Cluster problem reports, including a node’s “problem” property values, are written to the node's node.localhostid.dat file (also known as the “view snapshot” file). The values of “problem” properties also appear in the container log file.
| Problem property value | Importance | Description | Resolution/Troubleshooting |
|---|---|---|---|
| CLUSTER_STATE_MISMATCH | Warning | Different nodes in the cluster have differing cluster state information. | Restart the problem node if issue persists. |
| CONTAINER_VERSION_MISMATCH | Warning | There are differing container versions (build numbers) across the various view snapshots. | Restart all nodes if issue persists. |
| DIFFERENT_NODES | Warning | Two views which otherwise seem to agree (view ID and head node) do not have all the same nodes. | Check for cluster communication problems if issue persists. |
| HEAD_AWOL | Severe error | The head node, according to the live view, does not have a corresponding view snapshot. | Check for network issues and communication problems. |
| HEAD_SUSPECT | Warning | Like HEAD_AWOL, but some nodes are still starting up so this could be a timing issue. | Wait for nodes to finish starting up. |
| HEAD_SUSPECT_ESCALATED | Severe error | The HEAD_SUSPECT warning is escalated to a severe warning. | If issue persists, node is forcefully deleted. |
| LOCALHOSTID_CONFLICT | Severe error | Multiple nodes are writing to the same view snapshot file, indicating a conflict in localHostId. | Ensure nodes have unique localHostIds. |
| MINIMUM_CLUSTER_SIZE | Severe error | A node is waiting to restart but the cluster has reached its minimum allowable size. | Wait for the number of active nodes in the cluster to increase. |
| MULTIPLE_HEAD_NODES | Severe error | There is more than one head node in the various view snapshot files. | Check for network and configuration issues. |
| NODE_AWOL | Severe error | One or more nodes in the live view do not have a corresponding view snapshot. | Similar to the HEAD_AWOL problem, except that the missing nodes are not the head node. |
| NODE_DOWN | Warning | A node is either not running and did not remove its view file, or is hanging and no longer updating its view file. | Remove offending file or restart node. |
| NODE_SUSPECT Warning | Like NODE_AWOL, but some nodes are still starting up so this could be a timing issue. | Wait for nodes to finish starting up. | |
| ORPHANED_NODE | Severe error | The head node's view snapshot does not include this node. | Check for network and communication issues. |
| READ_FAILURE | Warning | Could not read a view snapshot file. | Check file system and container logs. |
| ROLLING_RESTART_* | Warning | Includes ROLLING_RESTART_MULTIPLE_HEAD_NODES, ROLLING_RESTART_VIEW_ID_MISMATCH, ROLLING_RESTART_VIEW_FILE_MISMATCH, ROLLING_RESTART_JAVA_HOME_MISMATCH,ROLLING_RESTART_ORPHANED_NODE, ROLLING_RESTART_HEAD_AWOL, and ROLLING_RESTART_NODE_AWOL. These issues are generally considered severe but are downgraded to warnings during a rolling restart. | These issues can generally be ignored while a rolling restart is in progress as they are often transient. If they persist after the restart is complete, then they should be investigated as they would be if they appeared outside of a rolling restart. |
| UNEXPECTED_HEAD_NODE_CHANGE | Severe error | One of the nodes changed its head node to a different one without receiving a head node change notification. | Check for network latency or partition issues. |
| VIEW_ID_CONFLICT | Severe error | Two different nodes have the same view ID. | Check network settings and view files. |
| VIEW_ID_MISMATCH | Warning | The head node has a view ID that does not match other nodes' view IDs. | Check for network issues and consider restarting nodes. |
| WRITE_FAILURE | Severe error | Could not write to a view snapshot file. | Check file system and container logs. |
For more information on Cluster monitoring refer to the following links: